Learning Hybrid Representations to Retrieve Semantically Equivalent Questions
نویسندگان
چکیده
Retrieving similar questions in online Q&A community sites is a difficult task because different users may formulate the same question in a variety of ways, using different vocabulary and structure. In this work, we propose a new neural network architecture to perform the task of semantically equivalent question retrieval. The proposed architecture, which we call BOW-CNN, combines a bag-ofwords (BOW) representation with a distributed vector representation created by a convolutional neural network (CNN). We perform experiments using data collected from two Stack Exchange communities. Our experimental results evidence that: (1) BOW-CNN is more effective than BOW based information retrieval methods such as TFIDF; (2) BOW-CNN is more robust than the pure CNN for long texts.
منابع مشابه
Detecting Semantically Equivalent Questions in Online User Forums
Two questions asking the same thing could be too different in terms of vocabulary and syntactic structure, which makes identifying their semantic equivalence challenging. This study aims to detect semantically equivalent questions in online user forums. We perform an extensive number of experiments using data from two different Stack Exchange forums. We compare standard machine learning methods...
متن کاملMultilingual Models for Compositional Distributed Semantics
We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or...
متن کاملSelecting Sentences versus Selecting Tree Constituents for Automatic Question Ranking
Community question answering (cQA) websites are focused on users who query questions onto an online forum, expecting for other users to provide them answers or suggestions. Unlike other social media, the length of the posted queries has no limits and queries tend to be multi-sentence elaborations combining context, actual questions, and irrelevant information. We approach the problem of questio...
متن کاملLearning from Reading Syntactically Complex Biology Texts
This paper concerns learning information by reading natural language texts. The major aim is to develop representations which are understandable by a reasoning engine and can be used to answer questions. We use abduction for mapping natural language data into concise and speci c theories underlying the textual data. Techniques for automatically generating usable data representations are discuss...
متن کاملMultilingual Semantic Parsing : Parsing Multiple Languages into Semantic Representations
We consider multilingual semantic parsing – the task of simultaneously parsing semantically equivalent sentences from multiple different languages into their corresponding formal semantic representations. Our model is built on top of the hybrid tree semantic parsing framework, where natural language sentences and their corresponding semantics are assumed to be generated jointly from an underlyi...
متن کامل